X Variables

Week 10

Evie Zhang

Old Dominion University

Topics

  • Exam 2
  • Spurious Regression
  • Cointegration

Exam 2

X predicts Y

Thus far, we have only used past \(Y\) values to forecast future \(Y\) values.

  • Of course, other things (besides \(e\)) can affect \(Y\)

In forecasting, we call these leading indicators

  • Example: housing starts, building permits, Starbucks (?)

Yield Curve

Interest rates reflect the risk of an asset.

  • higher rates \(\implies\) greater risk

Yield Curve

  • Maturity on the X axis, Yield on the Y axis.
  • Should be an upward slope.

An inverted curve has correctly predicted the past 7 recessions.

Yield Curve

Regressing Y on X

\[\% \Delta PCE_t = f(\% \Delta \text{Income}_t, \% \Delta \text{Savings}_t, \Delta \text{U-Rate}_t) + e_t\]

Code
library(lubridate)  # for ymd() and year()
macro <- read.csv("../data/macro.csv")
macro$DATE <- ymd(macro$DATE)

Regressing Y on X

Code
macrots <- ts(macro, start = c(min(year(macro$DATE)), 1))
acf(macrots[,2:4])

Regressing Y on X

To forecast using this type of model, you will need “future” values of your explanatory variables.

  • This is easy for some things and difficult for others
  • Vector Autoregression (VAR)
  • Impulse Response Functions (IRF)
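A VAR treats \(Y\) and its predictors symmetrically, so it produces the needed "future" \(X\) values as part of the forecast. A minimal sketch with simulated data, assuming the `vars` package is installed (it is not used elsewhere in these slides):

```r
# Sketch: forecasting y and x jointly with a VAR, on made-up data
# where x leads y by one period. Assumes the `vars` package.
library(vars)

set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.5), n = 200))
y <- 0.4 * c(0, head(x, -1)) + rnorm(200)  # x_{t-1} helps predict y_t

fit <- VAR(cbind(y, x), p = 2)     # VAR(2) in y and x
fc  <- predict(fit, n.ahead = 8)   # forecasts BOTH series 8 steps out
fc$fcst$y[, "fcst"]                # point forecasts for y
```

Because the VAR forecasts every variable in the system, you never need outside values of \(X\); the cost is that forecast errors in \(X\) feed into the forecasts of \(Y\).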

Regressing Y on X

OK, so estimate \(Y_t = f(X_t, e_t)\)!

\[\Delta GDP_t = \alpha + \beta \Delta GDP_{t-1} + \gamma \text{Spread}_{t-1} + e_t\]

But first … what about stationarity?

Spurious Regression

Suppose you have two independent time series \(y_t\) and \(x_t\).

  • If you regress \(y_t\) on \(x_t\) (\(y_t = \alpha + \beta x_t + e_t\)), what should \(\hat{\beta}\) be?

Spurious Regression

Code
tz <- c()

for(j in 1:1000){
  
  x <- rnorm(120)
  y <- rnorm(120)
  
  tz[j] <- summary(lm(y ~ x))$coefficients[2,3]
}

plot(table(round(tz, 1)),
     ylab = "Frequency",
     xlab = paste0(round(100*mean(abs(tz) > 1.96)),
                   "% Significant"))
abline(v = 0, col = "tomato")

Spurious Regression

Code
for(j in 1:1000){
  
  x <- rep(0, 120)
  y <- rep(0, 120)
  
  for(i in 2:120){
    
    x[i] <- x[i-1] + rnorm(1)
    y[i] <- y[i-1] + rnorm(1)
  }
  
  tz[j] <- summary(lm(y ~ x))$coefficients[2,3]
}

plot(table(round(tz, 0)),
     ylab = "Frequency",
     xlab = paste0(round(100*mean(abs(tz) > 1.96)),
                   "% Significant"))
abline(v = 0, col = "tomato")

Spurious Regression

These spurious t-statistics arise when you regress two independent random walks on one another.

Code
library(scales)  # alpha() for transparent colors
par(mfrow = c(1, 2))
plot(x, type = "l", col = alpha("tomato", .6),
     xlab = "Time", ylab = "",
     ylim = range(x, y))
lines(y, col = alpha("dodgerblue", .6))
plot(x, y, pch = 19, col = alpha("black", .6),
     xlab = "X", ylab = "Y")
abline(lm(y ~ x), lty = 2)

Spurious Regression

Code
library(stargazer)
reg1 <- lm(y ~ x)
t <- 2:length(y)
reg2 <- lm(y[t] ~ x[t] + y[t-1])
stargazer(reg1, reg2, type = "html")
                              Dependent variable:
                    ----------------------------------------------
                             y                      y[t]
                            (1)                      (2)
------------------------------------------------------------------
x                        0.818***
                         (0.074)
x[t]                                               0.083*
                                                   (0.043)
y[t - 1]                                          0.905***
                                                   (0.038)
Constant                 2.493***                  0.238*
                         (0.221)                   (0.132)
------------------------------------------------------------------
Observations               120                       119
R2                        0.511                     0.918
Adjusted R2               0.506                     0.916
Residual Std. Error  2.424 (df = 118)         1.000 (df = 116)
F Statistic     123.086*** (df = 1; 118)  647.680*** (df = 2; 116)
------------------------------------------------------------------
Note: *p<0.1; **p<0.05; ***p<0.01

Spurious Regression

Warnings:

  1. Check stationarity of the residuals
    • Regression is spurious if residuals have a unit root.
    • tseries::adf.test(lm(y ~ x)$residuals)
  2. Results change dramatically if lagged \(y\) is included
  3. If spurious, make stationary!

Spurious Regression

Code
tseries::adf.test(reg1$residuals)
tseries::adf.test(reg2$residuals)

    Augmented Dickey-Fuller Test

data:  reg1$residuals
Dickey-Fuller = -3.0513, Lag order = 4, p-value = 0.1399
alternative hypothesis: stationary


    Augmented Dickey-Fuller Test

data:  reg2$residuals
Dickey-Fuller = -3.9078, Lag order = 4, p-value = 0.01613
alternative hypothesis: stationary

Integration

Time series with unit roots are said to be integrated or \(I(1)\)

  • \(I(1)\) is unit root
  • \(I(0)\) is stationary
  • \(I(d)\) is \(d\)th-difference stationary
    • Hansen gives price levels and the money supply as examples of \(I(2)\) series
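A quick simulated check of these orders of integration, using `tseries::adf.test` as elsewhere in these slides (a sketch on made-up data, not part of the original code):

```r
# Sketch: a random walk is I(1); its first difference is I(0).
set.seed(42)
rw <- cumsum(rnorm(300))              # random walk: has a unit root
tseries::adf.test(rw)$p.value         # large: cannot reject the unit root
tseries::adf.test(diff(rw))$p.value   # small: stationary after one difference
```

Differencing once is exactly the "make it stationary" fix used below for spurious regressions.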

Money Supply

Cointegration

Two series are cointegrated if a linear combination of them has a lower level of integration.

  • If \(Y\) and \(X\) are \(I(1)\), but \(Y - \theta X\) is \(I(0)\), they are cointegrated.
  • Also consider two proportional time series.
    • If \(\frac{Y}{X}\) is stationary, then \(\log(Y) - \log(X)\) is \(I(0)\) (i.e., \(\theta = 1\))

Another way to think about it: two series are cointegrated if there is a (relatively) constant distance between the two.
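One way to see the definition: simulate two \(I(1)\) series that share a common stochastic trend. Each has a unit root, but their gap does not (a sketch on made-up data, using `tseries` as in the slides):

```r
# Sketch: two I(1) series sharing one random-walk trend are cointegrated.
set.seed(7)
trend <- cumsum(rnorm(300))    # common stochastic trend
x <- trend + rnorm(300)        # I(1)
y <- 2 + trend + rnorm(300)    # I(1); here theta = 1
tseries::adf.test(y - x)       # the gap y - x is I(0): cointegrated
```

Each series wanders off, but the trend cancels in \(y - x\), so the distance between them stays (relatively) constant.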

Cointegration

[Animations: an old lady and a boy each taking a random walk; an old man walking his dog — the classic cointegration analogy]

Cointegration

Again, \(y\) and \(x\) are cointegrated if:

\[z_t = y_t - \theta x_t \ \ \text{is} \ \ I(0)\]

\(\theta\) is sometimes known from theory

  • In the case of term spread, it is \(1\).
  • This can also be estimated.

Cointegration

Code
pi <- read.csv("../data/inc_pce.csv")
pi$DATE <- ymd(pi$DATE)
par(mfrow = c(1, 2))
plot(pi$DATE, pi$PI, type = "l",
     ylim = range(pi$PI, pi$PCE),
     xlab = "Date", ylab = "$",
     col = "tomato")
lines(pi$DATE, pi$PCE, type = "l", col = "dodgerblue")
legend("topleft", legend = c("PI", "PCE"),
       col = c("tomato", "dodgerblue"),
       lty = 1, bty = "n")

plot(pi$DATE, log(pi$PI), type = "l",
     ylim = log(range(pi$PI, pi$PCE)),
     xlab = "Date", ylab = "log($)",
     col = "tomato")
lines(pi$DATE, log(pi$PCE), type = "l", col = "dodgerblue")
legend("topleft", legend = c("PI", "PCE"),
       col = c("tomato", "dodgerblue"),
       lty = 1, bty = "n")

Cointegration

Code
reg <- lm(log(PCE) ~ log(PI), data = pi)
summary(reg)

Call:
lm(formula = log(PCE) ~ log(PI), data = pi)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.059112 -0.014470 -0.003781  0.016900  0.057097 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.3604532  0.0056156  -64.19   <2e-16 ***
log(PI)      1.0134213  0.0006801 1490.18   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.02196 on 730 degrees of freedom
Multiple R-squared:  0.9997,    Adjusted R-squared:  0.9997 
F-statistic: 2.221e+06 on 1 and 730 DF,  p-value: < 2.2e-16

Cointegration

Code
plot(pi$DATE,
     reg$residuals,
     type = "l")

Cointegration

Code
tseries::adf.test(reg$residuals)

    Augmented Dickey-Fuller Test

data:  reg$residuals
Dickey-Fuller = -2.9276, Lag order = 9, p-value = 0.1857
alternative hypothesis: stationary

Cointegration

Does the comovement of the 10-Year and 3-Month indicate a spurious relationship or cointegration?

  • They are both certainly non-stationary.

Cointegration

Code
fred <- read.csv("../data/spread.csv")
fred$DATE <- ymd(fred$DATE)
fred <- fred[fred$DGS10 != ".",]
fred <- fred[fred$DATE < ymd("2014-05-01"),]
colnames(fred) <- c("DATE", "t3m", "spread", "t120m")

for(i in 2:4) fred[,i] <- as.numeric(fred[,i])

reg <- lm(t120m ~ t3m, data = fred)
summary(reg)

Call:
lm(formula = t120m ~ t3m, data = fred)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.1886 -0.9453 -0.0241  0.9226  3.5382 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.48633    0.08521   29.18   <2e-16 ***
t3m          0.80997    0.01450   55.86   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.123 on 626 degrees of freedom
Multiple R-squared:  0.8329,    Adjusted R-squared:  0.8326 
F-statistic:  3120 on 1 and 626 DF,  p-value: < 2.2e-16

Cointegration

The residuals from this regression are \(z_t\)!

Code
plot(fred$DATE, reg$residuals, type = "l",
     xlab = "", ylab = "")

Cointegration

Code
tseries::adf.test(reg$residuals)

    Augmented Dickey-Fuller Test

data:  reg$residuals
Dickey-Fuller = -3.7392, Lag order = 8, p-value = 0.02204
alternative hypothesis: stationary

Cointegration

What does this have to do with spurious regression?

  • Stationary: use raw series
  • Non-Stationary: use differenced series
  • Non-Stationary & Cointegrated: use differenced series \(+\) error correction

Cointegration

Error Correction Model: the change in one series is explained by:

  • last period's gap between the series (the lagged error \(z_{t-1}\))
  • lags of the differences of each series

\[\begin{align}\Delta y_t &= \sum_{i = 1}^{p} \alpha_{i} \Delta y_{t-i} \\ &+ \sum_{j = 1}^{q} \beta_{j} \Delta x_{t-j} \\ &+ \gamma z_{t-1} + \mu + e_t\end{align}\]

Cointegration

\(z_t\) is the residual from the \(Y\) on \(X\) regression.

\[Y_t = \alpha + \beta X_t + z_t\]

If \(z_t\) is stationary, then \(Y\) and \(X\) are cointegrated.

Cointegration

Code
t <- 2:nrow(fred)
d_t3m <- fred$t3m[t] - fred$t3m[t-1]        # first differences
d_t120m <- fred$t120m[t] - fred$t120m[t-1]

t <- 2:length(d_t3m)
d_t3m_lag <- d_t3m[t-1]                     # lagged differences
d_t120m_lag <- d_t120m[t-1]
d_t3m <- d_t3m[t]
d_t120m <- d_t120m[t]

resid <- reg$residuals[2:(nrow(fred)-1)]    # z_{t-1}, aligned with the differences
summary(lm(d_t120m ~ d_t120m_lag + d_t3m_lag + resid))

Call:
lm(formula = d_t120m ~ d_t120m_lag + d_t3m_lag + resid)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.7033 -0.1464 -0.0163  0.1495  1.4264 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.001580   0.011021  -0.143  0.88607    
d_t120m_lag  0.346914   0.046645   7.437 3.42e-13 ***
d_t3m_lag   -0.051878   0.030530  -1.699  0.08977 .  
resid       -0.029292   0.009943  -2.946  0.00334 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2757 on 622 degrees of freedom
Multiple R-squared:  0.1081,    Adjusted R-squared:  0.1038 
F-statistic: 25.12 on 3 and 622 DF,  p-value: 2.364e-15